Parallelize task instance gather and support --gather on Modal #199

Merged
AlienKevin merged 40 commits into SWE-bench:main from AlienKevin:kevin/bug-gen-gather on Mar 9, 2026

AlienKevin (Collaborator) commented on Jan 16, 2026

This PR introduces parallel processing to the task instance gathering phase to significantly improve performance for large datasets and adds support for the gather phase in the Modal workflow.

Key Changes:

  • Parallel Gathering (swesmith/harness/gather.py):
    Before this PR, large repos such as math.js (>800 task instances) timed out after 20 minutes because Git branch creation and `git push` ran sequentially. After this PR, large repos finish in minutes.
    • Used ProcessPoolExecutor to process task instances in parallel across multiple cores.
    • Added unique, PID-based clone paths (e.g., repo_name_pid_subfolder) to prevent race conditions during concurrent Git operations.
    • Refactored the main loop into a process_instance worker function.
  • Modal Support (scripts/bug_gen_modal.py):
    • Added a --gather CLI flag that runs only task instance gathering, skipping generation and validation.

Question: should we rename FAIL_TO_PASS to PASS_TO_FAIL?
SWE-smith currently uses FAIL_TO_PASS for tests that pass before the bug patch is applied but fail afterwards, which inverts the usual meaning and causes confusion. PASS_TO_FAIL would be a more intuitive name, so I used that convention in this PR. However, adopting it would require updating the rest of the code and the datasets, so I'm not sure it's worth it.

Resolution: when writing the task instance JSON files, rename PASS_TO_FAIL to FAIL_TO_PASS to align with the SWE-bench naming convention.
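The output-time rename in this resolution might look like the sketch below. It assumes task instances are plain dicts written as one JSON list; `to_swebench_keys` and `write_task_instances` are hypothetical names, not functions from the PR. Renaming only at output keeps internal code on PASS_TO_FAIL while the published data matches SWE-bench.

```python
import json
from pathlib import Path


def to_swebench_keys(instance: dict) -> dict:
    """Return a copy of a task instance using SWE-bench key names.

    Internally, tests that pass on the original code and fail once the bug
    is applied are tracked as PASS_TO_FAIL. From the task's point of view the
    buggy code is the starting point, so those tests fail before the fix and
    pass after it, which is what SWE-bench calls FAIL_TO_PASS.
    """
    out = dict(instance)  # shallow copy; the caller's dict is left untouched
    if "PASS_TO_FAIL" in out:
        out["FAIL_TO_PASS"] = out.pop("PASS_TO_FAIL")
    return out


def write_task_instances(instances: list[dict], path: Path) -> None:
    # Hypothetical writer: the rename happens only at serialization time.
    renamed = [to_swebench_keys(inst) for inst in instances]
    path.write_text(json.dumps(renamed, indent=2))
```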

Test command

uv run modal run scripts/bug_gen_modal.py --language javascript --gather &> gather.log

AlienKevin and others added 8 commits on January 16, 2026 at 08:56
Previously, the script would fail if `git commit` was attempted with no changes. This was observed in cases like `Automattic__mongoose.5f57a5bb` where the applied patch resulted in no tracked changes. Now, we check `git status --porcelain` before committing and skip the instance if no changes are detected.
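A minimal sketch of the empty-commit guard described in this commit, using `subprocess` to call Git directly. `has_changes` and `commit_if_changed` are hypothetical helpers; note that `git status --porcelain` also lists untracked files (as `??` lines), so those count as changes here.

```python
import subprocess


def has_changes(repo_dir: str) -> bool:
    """True if `git status --porcelain` reports any modified or untracked paths."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    # Porcelain output is empty when the working tree is clean.
    return bool(result.stdout.strip())


def commit_if_changed(repo_dir: str, message: str) -> bool:
    """Commit all changes; return False (caller skips the instance) if there are none."""
    if not has_changes(repo_dir):
        return False
    subprocess.run(["git", "add", "-A"], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
    return True
```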
codecov bot commented on Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

| Files with missing lines | Coverage Δ |
|---|---|
| swesmith/profiles/base.py | 82.82% <100.00%> (+0.09%) ⬆️ |

- Switch from per-task clones to per-worker persistent repositories.
- Reduces clone operations from O(tasks) to O(workers) (e.g. 1400 -> 17).
- Eliminates file locking race conditions.
- Total gather time for JavaScript is now ~5 minutes (bottlenecked by math.js).
AlienKevin force-pushed the kevin/bug-gen-gather branch from 8cefa3c to e42a5e2 on January 17, 2026 at 07:35
AlienKevin force-pushed the kevin/bug-gen-gather branch from 98898a8 to 302b73e on January 17, 2026 at 07:39
AlienKevin force-pushed the kevin/bug-gen-gather branch from 52ba582 to f7f68cb on January 20, 2026 at 08:20
pre-commit-ci bot and others added 8 commits on January 20, 2026 at 08:20
Cause:
- gather invoked apply commands with a relative patch path (`../logs/run_validation/.../patch.diff`).
- During modal gather, each worker runs from a temporary repo directory under `/tmp/...`, so that relative path did not resolve to the mounted logs directory.
- `git apply` and fallback `patch` both failed with "can't open patch ... No such file or directory", resulting in dropped instances and empty/underfilled task outputs.

Fix:
- Resolve `patch.diff` to an absolute path before apply.
- Shell-quote that absolute path and pass it to every command in `GIT_APPLY_CMDS`.

Result:
- Patch application no longer depends on the worker's cwd; gather can apply valid Rust patches and produce task instances consistently.
Root cause: upload_tasks_to_hf_modal.py hardcoded JavaScript paths in both task discovery and per-repo processing. Running with --language rust still read /data/javascript/... and javascript/task_insts, which broke Rust upload workflows and could surface as missing or empty problem statements for non-JS datasets.

Fix: thread a language argument through the local entrypoint and worker function, list files from {language}/task_insts, and pass language explicitly through process_repo.map so each worker reads /data/{language}/task_insts and /data/{language}/issue_gen.
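A minimal sketch of threading `language` through discovery and per-repo processing, using plain Python in place of the Modal app. `DATA_ROOT`, `list_task_files`, `process_repo` and `upload_all` are hypothetical, and the `*.json` pattern for files in `task_insts` is an assumption. The built-in `map` with `itertools.repeat` stands in for passing `language` explicitly alongside each file, as the commit describes for `process_repo.map`.

```python
from itertools import repeat
from pathlib import Path

DATA_ROOT = Path("/data")  # mount point named in the commit message


def task_inst_dir(language: str, root: Path = DATA_ROOT) -> Path:
    return root / language / "task_insts"


def issue_gen_dir(language: str, root: Path = DATA_ROOT) -> Path:
    return root / language / "issue_gen"


def list_task_files(language: str, root: Path = DATA_ROOT) -> list[Path]:
    """Discover per-repo task files for one language (nothing hardcoded)."""
    d = task_inst_dir(language, root)
    return sorted(d.glob("*.json")) if d.is_dir() else []


def process_repo(task_file: Path, language: str, root: Path = DATA_ROOT) -> dict:
    # `language` arrives as an explicit argument, so each worker reads the
    # matching /data/{language}/... subtree instead of a JavaScript default.
    return {"task_file": task_file, "issue_dir": issue_gen_dir(language, root)}


def upload_all(language: str, root: Path = DATA_ROOT) -> list[dict]:
    files = list_task_files(language, root)
    return list(map(process_repo, files, repeat(language), repeat(root)))
```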
AlienKevin merged commit 9f2ba94 into SWE-bench:main on Mar 9, 2026
4 checks passed